
fix: workload identity token refresh issue #2071

Merged
andyzhangx merged 1 commit into kubernetes-sigs:master from the wi-token-refresh branch on Jul 17, 2025

Conversation

andyzhangx
Member

@andyzhangx andyzhangx commented Jul 11, 2025

What type of PR is this?
/kind bug

What this PR does / why we need it:
fix: Workload Identity token refresh issue

Which issue(s) this PR fixes:

Fixes #1987

Requirements:

Special notes for your reviewer:

mkdir /tmp/test
export AZURE_STORAGE_ACCOUNT=andyblobtest
export AZURE_STORAGE_SPN_CLIENT_ID=34719cde-3ea8-489e-8850-c0a4fd820348
export AZURE_OAUTH_TOKEN_FILE=/var/lib/kubelet/plugins/blob.csi.azure.com/34719cde-3ea8-489e-8850-c0a4fd820348
export AZURE_STORAGE_SPN_TENANT_ID=72f988bf-86f1-41af-91ab-2d7cd011db47
blobfuse2 /tmp/test --container-name=test --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120

I0711 12:04:17.258691 2784585 utils.go:104] GRPC call: /csi.v1.Node/NodePublishVolume
I0711 12:04:17.258718 2784585 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/60303ae22b998861bce3b28f33eec1be758a213c86c93c076dbe9f558c11c752/globalmount","target_path":"/var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o allow_other","--file-cache-timeout-in-seconds=120"]}},"access_mode":{"mode":5}},"volume_context":{"clientID":"34719cde-3ea8-489e-8850-c0a4fd820348","containerName":"test","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"nginx-blob2","csi.storage.k8s.io/pod.namespace":"default","csi.storage.k8s.io/pod.uid":"739716e5-db0f-450d-ac59-10576a9386d5","csi.storage.k8s.io/serviceAccount.name":"andyblob","csi.storage.k8s.io/serviceAccount.tokens":"***stripped***","mountWithWorkloadIdentityToken":"true","storageAccount":"andyblobtest"},"volume_id":"test2"}
I0711 12:04:17.259089 2784585 nodeserver.go:84] NodePublishVolume: volume(test2) mount on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount with service account token, clientID: 34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:04:17.259367 2784585 blob.go:474] parsing volumeID(test2) return with error: error parsing volume id: "test2", should at least contain two #
I0711 12:04:17.259416 2784585 blob.go:560] volumeID(test2) authEnv: []
I0711 12:04:17.259445 2784585 blob.go:585] clientID(34719cde-3ea8-489e-8850-c0a4fd820348) is specified, use workload identity for blobfuse auth
I0711 12:04:17.259529 2784585 blob.go:592] write workload identity token to /var/lib/kubelet/plugins/blob.csi.azure.com/34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:04:17.259656 2784585 nodeserver.go:434] target /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount protocol  volumeId test2
mountflags [-o allow_other --file-cache-timeout-in-seconds=120]
mountOptions [-o allow_other --file-cache-timeout-in-seconds=120 --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/test2 --container-name=test --pre-mount-validate=true] volumeMountGroup
args /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount -o allow_other --file-cache-timeout-in-seconds=120 --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/test2 --container-name=test --pre-mount-validate=true
serverAddress andyblobtest.blob.core.windows.net
I0711 12:04:17.259712 2784585 nodeserver.go:166] start connecting to blobfuse proxy, protocol: , args: /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount -o allow_other --file-cache-timeout-in-seconds=120 --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/test2 --container-name=test --pre-mount-validate=true
I0711 12:04:17.261800 2784585 nodeserver.go:184] begin to mount with blobfuse proxy, protocol: , args: /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount -o allow_other --file-cache-timeout-in-seconds=120 --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/test2 --container-name=test --pre-mount-validate=true
I0711 12:04:18.153054 2784585 mount_linux.go:324] Detected umount with safe 'not mounted' behavior
I0711 12:04:18.153171 2784585 nodeserver.go:709] blobfuse mount at /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount success
I0711 12:04:18.153195 2784585 nodeserver.go:492] volume(test2) mount on "/var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount" succeeded
I0711 12:04:18.153278 2784585 azure_metrics.go:105] "Observed Request Latency" latency_seconds=0.894112372 request="blob_csi_driver_node_stage_volume" resource_group="mc_andy-aks133_andy-aks133_eastus2euap" subscription_id="" source="blob.csi.azure.com" volumeid="test2" result_code="succeeded"
I0711 12:04:18.153296 2784585 utils.go:111] GRPC response: {}
I0711 12:04:21.582936 2784585 utils.go:104] GRPC call: /csi.v1.Node/NodePublishVolume
I0711 12:04:21.582956 2784585 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/60303ae22b998861bce3b28f33eec1be758a213c86c93c076dbe9f558c11c752/globalmount","target_path":"/var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o allow_other","--file-cache-timeout-in-seconds=120"]}},"access_mode":{"mode":5}},"volume_context":{"clientID":"34719cde-3ea8-489e-8850-c0a4fd820348","containerName":"test","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"nginx-blob2","csi.storage.k8s.io/pod.namespace":"default","csi.storage.k8s.io/pod.uid":"739716e5-db0f-450d-ac59-10576a9386d5","csi.storage.k8s.io/serviceAccount.name":"andyblob","csi.storage.k8s.io/serviceAccount.tokens":"***stripped***","mountWithWorkloadIdentityToken":"true","storageAccount":"andyblobtest"},"volume_id":"test2"}
I0711 12:04:21.583221 2784585 nodeserver.go:84] NodePublishVolume: volume(test2) mount on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount with service account token, clientID: 34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:04:21.602318 2784585 nodeserver.go:678] already mounted to target /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount
I0711 12:04:21.602368 2784585 blob.go:474] parsing volumeID(test2) return with error: error parsing volume id: "test2", should at least contain two #
I0711 12:04:21.602423 2784585 blob.go:560] volumeID(test2) authEnv: []
I0711 12:04:21.602472 2784585 blob.go:585] clientID(34719cde-3ea8-489e-8850-c0a4fd820348) is specified, use workload identity for blobfuse auth
I0711 12:04:21.602572 2784585 blob.go:592] write workload identity token to /var/lib/kubelet/plugins/blob.csi.azure.com/34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:04:21.602850 2784585 nodeserver.go:337] NodeStageVolume: volume test2 is already mounted on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount
I0711 12:04:21.602947 2784585 azure_metrics.go:105] "Observed Request Latency" latency_seconds=0.019627693 request="blob_csi_driver_node_stage_volume" resource_group="mc_andy-aks133_andy-aks133_eastus2euap" subscription_id="" source="blob.csi.azure.com" volumeid="test2" result_code="failed_csi_driver_node_stage_volume"
I0711 12:04:21.602984 2784585 utils.go:111] GRPC response: {}
I0711 12:04:22.590434 2784585 utils.go:104] GRPC call: /csi.v1.Node/NodePublishVolume
I0711 12:04:22.590449 2784585 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/60303ae22b998861bce3b28f33eec1be758a213c86c93c076dbe9f558c11c752/globalmount","target_path":"/var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o allow_other","--file-cache-timeout-in-seconds=120"]}},"access_mode":{"mode":5}},"volume_context":{"clientID":"34719cde-3ea8-489e-8850-c0a4fd820348","containerName":"test","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"nginx-blob2","csi.storage.k8s.io/pod.namespace":"default","csi.storage.k8s.io/pod.uid":"739716e5-db0f-450d-ac59-10576a9386d5","csi.storage.k8s.io/serviceAccount.name":"andyblob","csi.storage.k8s.io/serviceAccount.tokens":"***stripped***","mountWithWorkloadIdentityToken":"true","storageAccount":"andyblobtest"},"volume_id":"test2"}
I0711 12:04:22.590678 2784585 nodeserver.go:84] NodePublishVolume: volume(test2) mount on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount with service account token, clientID: 34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:04:22.610386 2784585 nodeserver.go:678] already mounted to target /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount
I0711 12:04:22.610409 2784585 blob.go:474] parsing volumeID(test2) return with error: error parsing volume id: "test2", should at least contain two #
I0711 12:04:22.610450 2784585 blob.go:560] volumeID(test2) authEnv: []
I0711 12:04:22.610467 2784585 blob.go:585] clientID(34719cde-3ea8-489e-8850-c0a4fd820348) is specified, use workload identity for blobfuse auth
I0711 12:04:22.610563 2784585 blob.go:592] write workload identity token to /var/lib/kubelet/plugins/blob.csi.azure.com/34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:04:22.610755 2784585 nodeserver.go:337] NodeStageVolume: volume test2 is already mounted on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount
I0711 12:04:22.610799 2784585 azure_metrics.go:105] "Observed Request Latency" latency_seconds=0.020078951 request="blob_csi_driver_node_stage_volume" resource_group="mc_andy-aks133_andy-aks133_eastus2euap" subscription_id="" source="blob.csi.azure.com" volumeid="test2" result_code="failed_csi_driver_node_stage_volume"
I0711 12:04:22.610825 2784585 utils.go:111] GRPC response: {}
I0711 12:05:35.291563 2784585 utils.go:104] GRPC call: /csi.v1.Node/NodePublishVolume
I0711 12:05:35.291579 2784585 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/60303ae22b998861bce3b28f33eec1be758a213c86c93c076dbe9f558c11c752/globalmount","target_path":"/var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o allow_other","--file-cache-timeout-in-seconds=120"]}},"access_mode":{"mode":5}},"volume_context":{"clientID":"34719cde-3ea8-489e-8850-c0a4fd820348","containerName":"test","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"nginx-blob2","csi.storage.k8s.io/pod.namespace":"default","csi.storage.k8s.io/pod.uid":"739716e5-db0f-450d-ac59-10576a9386d5","csi.storage.k8s.io/serviceAccount.name":"andyblob","csi.storage.k8s.io/serviceAccount.tokens":"***stripped***","mountWithWorkloadIdentityToken":"true","storageAccount":"andyblobtest"},"volume_id":"test2"}
I0711 12:05:35.291855 2784585 nodeserver.go:84] NodePublishVolume: volume(test2) mount on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount with service account token, clientID: 34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:05:35.313091 2784585 nodeserver.go:678] already mounted to target /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount
I0711 12:05:35.313171 2784585 blob.go:474] parsing volumeID(test2) return with error: error parsing volume id: "test2", should at least contain two #
I0711 12:05:35.313203 2784585 blob.go:560] volumeID(test2) authEnv: []
I0711 12:05:35.313253 2784585 blob.go:585] clientID(34719cde-3ea8-489e-8850-c0a4fd820348) is specified, use workload identity for blobfuse auth
I0711 12:05:35.313346 2784585 blob.go:592] write workload identity token to /var/lib/kubelet/plugins/blob.csi.azure.com/34719cde-3ea8-489e-8850-c0a4fd820348
I0711 12:05:35.313556 2784585 nodeserver.go:337] NodeStageVolume: volume test2 is already mounted on /var/lib/kubelet/pods/739716e5-db0f-450d-ac59-10576a9386d5/volumes/kubernetes.io~csi/pv-blob/mount

Release note:

fix: Workload Identity token refresh issue

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 11, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2025
@k8s-ci-robot k8s-ci-robot requested review from cvvz and feiskyer July 11, 2025 08:01
@andyzhangx andyzhangx requested review from Copilot and removed request for feiskyer and cvvz July 11, 2025 08:01
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 11, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR updates the blob CSI driver to always refresh Workload Identity tokens by writing them to a file, adjusts the mount logic in NodeStageVolume to allow token refresh on already-mounted volumes, and adds related deployment settings.

  • Always call GetAuthEnv and write a federated token file even if the volume is already staged.
  • Ensure the test creates the expected token directory before running.
  • Expose new CSIDriver fields requiresRepublish and expirationSeconds for token rotation.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/blob/nodeserver_test.go | Create directory for token file in test setup |
| pkg/blob/nodeserver.go | Reorder mount check and error handling around GetAuthEnv |
| pkg/blob/blob.go | Write the workload identity token to a file on the node |
| deploy/csi-blob-driver.yaml | Add requiresRepublish and expirationSeconds under spec |
Comments suppressed due to low confidence (1)

deploy/csi-blob-driver.yaml:13

  • Kubernetes CSIDriver spec uses requiresRepublish (lower camelCase) rather than RequiresRepublish. Update the field name to requiresRepublish to match the API schema.
  RequiresRepublish: true

@andyzhangx andyzhangx force-pushed the wi-token-refresh branch 2 times, most recently from c588551 to fb0e43b Compare July 12, 2025 03:39
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 12, 2025
@andyzhangx andyzhangx changed the title [WIP] fix: Workload Identity token refresh issue fix: Workload Identity token refresh issue Jul 12, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 12, 2025
@andyzhangx andyzhangx changed the title fix: Workload Identity token refresh issue [WIP] fix: Workload Identity token refresh issue Jul 13, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2025
@andyzhangx andyzhangx changed the title [WIP] fix: Workload Identity token refresh issue fix: Workload Identity token refresh issue Jul 16, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2025
fix

@andyzhangx andyzhangx changed the title fix: Workload Identity token refresh issue fix: workload identity token refresh issue Jul 16, 2025
@andyzhangx andyzhangx merged commit 77dafc1 into kubernetes-sigs:master Jul 17, 2025
21 of 22 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Workload identities using managed identity should renew its token before expiration. Causes outage.
2 participants